Import the required libraries; this step might take some time
%%time
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
import tensorflow.keras.layers as layers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
%matplotlib inline
Wall time: 9.26 s
## check the tensorflow version that we have
tf.__version__
'2.6.0'
## check if a GPU is available for training
num_gpu = len(tf.config.list_physical_devices('GPU'))
print(f'The number of available GPUs for training = {num_gpu}')
if num_gpu:
    print('Training on GPU!')
else:
    print('Training on CPU!')
The number of available GPUs for training = 1
Training on GPU!
## tensorflow_datasets module

Since we need some data to train a predictive model on, we can leverage the tensorflow_datasets module and load one of its preprocessed datasets with a few lines of code; specifically, we will load the Fashion MNIST dataset from the tensorflow_datasets module.
Here is a breakdown and an explanation of the parameters that we have to set while using the tensorflow_datasets API:

- name: the name of the dataset that we would like to load from among the datasets available in the tensorflow_datasets module
- split: how to split the data into training and test records; we can also control the % of the records in each of our splits if we would like
- data_dir: the directory into which the data should be downloaded; if the data is already available in the directory then it won't be downloaded again
- as_supervised: a Boolean for whether to include the labels with the features; if set to True, then each record is represented by a Tuple of length 2, where the first element is the features and the second element is the label
- batch_size: an Integer that represents the size of each batch of the data; set it to -1 to return all the records in the dataset, and we will then manage the batch size during the training process. Setting it to -1 might show a warning message, but don't worry about that for now
- with_info: a Boolean for whether to include the metadata of this dataset; if set to False then only the dataset is returned, and if set to True then a Tuple with the structure (Dataset, Metadata) is returned

## loading the dataset and its metadata
dataset, metadata = tfds.load(
    'fashion_mnist',
    split=['train', 'test'],
    data_dir='./data',
    as_supervised=True,
    batch_size=-1,
    with_info=True)
WARNING:tensorflow:From C:\Users\MinaNagib\anaconda3\envs\dl\lib\site-packages\tensorflow_datasets\core\dataset_builder.py:643: get_single_element (from tensorflow.python.data.experimental.ops.get_single_element) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.get_single_element()`.
The dataset object is a List of 2 Tuples, each Tuple is of length = 2 as well, and here is the logical structure of that dataset object:
[(train_images, train_labels), (test_images, test_labels)]
So, let's unpack these 2 Tuples and convert the data from Tensors to NumPy Arrays
## converting the dataset into numpy arrays and tuple unpacking
## the dataset into training data and testing data
(train_images, train_labels), (test_images, test_labels) = tfds.as_numpy(dataset)
## verify the data is loaded correctly
display(type(train_images), train_images.shape)
display(type(train_labels), train_labels.shape)
display(type(test_images), test_images.shape)
display(type(test_labels), test_labels.shape)
numpy.ndarray
(60000, 28, 28, 1)
numpy.ndarray
(60000,)
numpy.ndarray
(10000, 28, 28, 1)
numpy.ndarray
(10000,)
From the cell above, we can see that our training dataset is composed of 60,000 images, each 28 x 28 pixels, while the test dataset is composed of 10,000 images.
So, you might be wondering about the metadata: it holds information about the dataset, including its version and the class labels. To access the labels of the classes within this dataset, we can use the following code:
class_names = metadata.features['label'].names
print("Class names: {}".format(class_names))
Class names: ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Let's plot the first 2 images within the training dataset to see what they look like, along with their corresponding labels
## load the first 2 images within our training examples
for image, label in zip(train_images[:2], train_labels[:2]):
    ## converting the image from a 3d array of 28 x 28 x 1 shape
    ## to a 2d array of 28 x 28 shape
    image = np.squeeze(image)
    ## Plotting the images
    plt.figure()
    plt.imshow(image, cmap=plt.cm.binary)
    plt.colorbar()
    plt.xlabel(class_names[label])
    plt.show()
The 28 x 28 pixel images have pixel values ranging from 0 to 255; we will need to normalize these values to the range 0 to 1, which helps make training faster and more stable
## normalizing the data
train_images = train_images / 255
test_images = test_images / 255
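As a quick sanity check that dividing by 255 behaves as expected, we can verify the value range; this is an illustrative sketch on random data standing in for our image arrays, not the dataset itself:

```python
import numpy as np

## random stand-in for an image batch, values in 0..255
images = np.random.randint(0, 256, size=(4, 28, 28, 1)).astype(np.float64)
normalized = images / 255
## after normalization every pixel should lie in [0, 1]
print(normalized.min() >= 0.0 and normalized.max() <= 1.0)  # True
```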
## replotting the images to see if there are any changes with the plots
## and to verify the changes
for image, label in zip(train_images[:2], train_labels[:2]):
    ## converting the image from a 3d array of 28 x 28 x 1 shape
    ## to a 2d array of 28 x 28 shape
    image = np.squeeze(image)
    ## Plotting the images
    plt.figure()
    plt.imshow(image, cmap=plt.cm.binary)
    plt.colorbar()
    plt.xlabel(class_names[label])
    plt.show()
Split the training data into train and validation datasets
from sklearn.model_selection import train_test_split
train_images, valid_images, train_labels, valid_labels = train_test_split(
    train_images, train_labels, test_size=0.2, random_state=42, stratify=train_labels)
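The stratify argument keeps the class proportions identical in both splits, which matters for a fair validation estimate. A minimal sketch on synthetic balanced labels (the arrays here are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

## 10 balanced classes, 100 examples each, with dummy features
labels = np.repeat(np.arange(10), 100)
features = np.zeros((len(labels), 4))
X_tr, X_va, y_tr, y_va = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)
## stratification preserves the per-class counts exactly
print(np.bincount(y_tr))  # 80 examples per class
print(np.bincount(y_va))  # 20 examples per class
```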
Time to build our first TensorFlow model. I will leverage the high-level Keras API as it makes life much easier. I will also break down the code in the cell below step by step, so feel free to navigate between this markdown cell and the code cell below.
- Flattening Layer: converts the inputs from 2d images of shape 28 x 28 to a 1d vector of length 784
- Dense Layer: also known as a Fully Connected Layer; in our case this takes the 784-unit vector and outputs 128 values. These 128 values are called Hidden Nodes, as this layer is an intermediate layer between the inputs and the outputs of our network, so it is also called a Hidden Layer. Keep in mind you can try values other than the 128 specified here and observe the effect of different values on the model's performance
- ReLU Activation Function: defined during the creation of the Hidden Layer above; the purpose of adding this function is to introduce nonlinearity into our model
- Dense Layer: another dense layer that takes the 128 outputs from the previous layer, after applying the ReLU function to them, and outputs 10 nodes. These nodes in this form are often referred to as logits. We have to choose 10 nodes in our case since we have 10 different classes in our problem, one node per class
- Softmax Function: takes the logits from the output layer and converts these values into a probability distribution, which makes the output meaningful to us; more explanation of this probability distribution is covered later in the notebook

Note: We need to specify the input_shape, which is the shape of each single record in the dataset. Since our dataset is composed of 28 x 28 pixel images, we should set the input_shape to (28, 28). Although our data will be fed to the model in batches of shape (batch_size x 28 x 28 x 1), an input shape of (28, 28) or (28, 28, 1) will be fine, as it is only used by the network to initialize the weights; these weights depend mainly on the number of pixels regardless of the extra third dimension, so either format will work.
We could skip passing an input_shape when creating the model; in that scenario the network will figure out the shape of the data when it starts training. However, this throws an error with the validation set in its current shape and format, so a good practice is to pass the input_shape when creating the model.
tf.random.set_seed(42)
np.random.seed(42)
model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(128, activation=tf.nn.relu),
    layers.Dense(10, activation=tf.nn.softmax)])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
dense (Dense)                (None, 128)               100480
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
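The parameter counts reported by the summary can be verified by hand: a Dense layer holds one weight per input-output pair plus one bias per output unit, while Flatten only reshapes and has no weights.

```python
## verifying the parameter counts from model.summary() by hand:
## a Dense layer has (n_inputs * n_units) weights plus n_units biases
flatten_params = 0                 # Flatten only reshapes, no weights
hidden_params = 784 * 128 + 128    # weights + biases of the hidden layer
output_params = 128 * 10 + 10      # weights + biases of the output layer
total_params = flatten_params + hidden_params + output_params
print(total_params)  # 101770
```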
Once the model's architecture is defined, we need to compile the model. During this step we need to define:
- Optimizer: the method for updating the model's parameters "weights" after each training iteration
- Loss Function: also known as a Cost Function, which measures how far our model's predictions are from the real values. The optimizer will try to minimize the loss function; this is the training process
- Metrics: any metrics we would like the model to report during training; these help us evaluate the model's performance as they are more meaningful to us

## note: we haven't specified a learning rate here
## so the model will be trained with the default learning rate
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
After defining the model's architecture and compiling the model, it's time to train it. Training is the process in which the model learns the pattern mapping the inputs to the outputs. To train the model, we use the .fit() method. The parameters defined below are:
- x: the input data representing the features; in our case these are the images
- y: the corresponding labels for the inputs; these are the classes that each image belongs to
- validation_data: a tuple-like object; at a high level, this is a subset of the data that the model doesn't train on and that isn't used to adjust the model's Parameters. This subset is used to evaluate the model during training and to manually tune the model's Hyperparameters
- epochs: the number of training iterations
- shuffle: a boolean for whether to shuffle the data before each training iteration
- batch_size: the number of training examples that the model is fed before updating its parameters

## a helper callback to calculate the loss after each training epoch
eoe_train_loss, eoe_train_accs = [], []

class logging_metrics(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        train_loss, train_accuracy = self.model.evaluate(x=train_images,
                                                         y=train_labels,
                                                         batch_size=len(train_images),
                                                         verbose=0)
        eoe_train_loss.append(train_loss)
        eoe_train_accs.append(train_accuracy)

log_metrics = logging_metrics()
%%time
history = model.fit(x=train_images,
                    y=train_labels,
                    validation_data=(valid_images, valid_labels),
                    epochs=50,
                    shuffle=True,
                    batch_size=64,
                    callbacks=[log_metrics])
Epoch 1/50 750/750 [==============================] - 5s 5ms/step - loss: 0.5429 - accuracy: 0.8133 - val_loss: 0.4734 - val_accuracy: 0.8253 Epoch 2/50 750/750 [==============================] - 3s 4ms/step - loss: 0.4063 - accuracy: 0.8546 - val_loss: 0.3800 - val_accuracy: 0.8630 Epoch 3/50 750/750 [==============================] - 3s 4ms/step - loss: 0.3689 - accuracy: 0.8688 - val_loss: 0.3526 - val_accuracy: 0.8704 Epoch 4/50 750/750 [==============================] - 3s 4ms/step - loss: 0.3398 - accuracy: 0.8771 - val_loss: 0.3516 - val_accuracy: 0.8692 Epoch 5/50 750/750 [==============================] - 3s 4ms/step - loss: 0.3243 - accuracy: 0.8824 - val_loss: 0.3324 - val_accuracy: 0.8734 Epoch 6/50 750/750 [==============================] - 3s 4ms/step - loss: 0.3029 - accuracy: 0.8899 - val_loss: 0.3151 - val_accuracy: 0.8840 Epoch 7/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2907 - accuracy: 0.8939 - val_loss: 0.3243 - val_accuracy: 0.8798 Epoch 8/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2779 - accuracy: 0.8977 - val_loss: 0.3134 - val_accuracy: 0.8871 Epoch 9/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2668 - accuracy: 0.9023 - val_loss: 0.3148 - val_accuracy: 0.8851 Epoch 10/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2591 - accuracy: 0.9041 - val_loss: 0.2999 - val_accuracy: 0.8889 Epoch 11/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2489 - accuracy: 0.9081 - val_loss: 0.3000 - val_accuracy: 0.8917 Epoch 12/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2398 - accuracy: 0.9116 - val_loss: 0.3130 - val_accuracy: 0.8877 Epoch 13/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2306 - accuracy: 0.9152 - val_loss: 0.3158 - val_accuracy: 0.8848 Epoch 14/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2259 - accuracy: 0.9179 - val_loss: 0.2971 - val_accuracy: 0.8903 Epoch 
15/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2164 - accuracy: 0.9202 - val_loss: 0.2979 - val_accuracy: 0.8906 Epoch 16/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2124 - accuracy: 0.9209 - val_loss: 0.3079 - val_accuracy: 0.8889 Epoch 17/50 750/750 [==============================] - 3s 4ms/step - loss: 0.2043 - accuracy: 0.9241 - val_loss: 0.2973 - val_accuracy: 0.8938 Epoch 18/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1995 - accuracy: 0.9255 - val_loss: 0.3020 - val_accuracy: 0.8933 Epoch 19/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1938 - accuracy: 0.9281 - val_loss: 0.2883 - val_accuracy: 0.8988 Epoch 20/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1888 - accuracy: 0.9303 - val_loss: 0.3001 - val_accuracy: 0.8948 Epoch 21/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1839 - accuracy: 0.9317 - val_loss: 0.3078 - val_accuracy: 0.8925 Epoch 22/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1774 - accuracy: 0.9353 - val_loss: 0.3049 - val_accuracy: 0.8955 Epoch 23/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1747 - accuracy: 0.9347 - val_loss: 0.2967 - val_accuracy: 0.8989 Epoch 24/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1698 - accuracy: 0.9377 - val_loss: 0.3104 - val_accuracy: 0.8924 Epoch 25/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1665 - accuracy: 0.9379 - val_loss: 0.3045 - val_accuracy: 0.8967 Epoch 26/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1596 - accuracy: 0.9414 - val_loss: 0.3206 - val_accuracy: 0.8905 Epoch 27/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1553 - accuracy: 0.9424 - val_loss: 0.3242 - val_accuracy: 0.8929 Epoch 28/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1545 - accuracy: 0.9421 - val_loss: 0.3024 - val_accuracy: 0.8972 
Epoch 29/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1512 - accuracy: 0.9432 - val_loss: 0.3199 - val_accuracy: 0.8960 Epoch 30/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1445 - accuracy: 0.9466 - val_loss: 0.3450 - val_accuracy: 0.8893 Epoch 31/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1445 - accuracy: 0.9469 - val_loss: 0.3248 - val_accuracy: 0.8957 Epoch 32/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1396 - accuracy: 0.9478 - val_loss: 0.3353 - val_accuracy: 0.8974 Epoch 33/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1342 - accuracy: 0.9501 - val_loss: 0.3395 - val_accuracy: 0.8928 Epoch 34/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1335 - accuracy: 0.9509 - val_loss: 0.3387 - val_accuracy: 0.8943 Epoch 35/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1286 - accuracy: 0.9517 - val_loss: 0.3603 - val_accuracy: 0.8918 Epoch 36/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1269 - accuracy: 0.9532 - val_loss: 0.3293 - val_accuracy: 0.8994 Epoch 37/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1257 - accuracy: 0.9526 - val_loss: 0.3598 - val_accuracy: 0.8919 Epoch 38/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1201 - accuracy: 0.9554 - val_loss: 0.3519 - val_accuracy: 0.8955 Epoch 39/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1205 - accuracy: 0.9551 - val_loss: 0.3507 - val_accuracy: 0.8988 Epoch 40/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1159 - accuracy: 0.9582 - val_loss: 0.3669 - val_accuracy: 0.8955 Epoch 41/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1157 - accuracy: 0.9579 - val_loss: 0.3664 - val_accuracy: 0.8936 Epoch 42/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1117 - accuracy: 0.9591 - val_loss: 0.3611 - val_accuracy: 
0.8956 Epoch 43/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1101 - accuracy: 0.9595 - val_loss: 0.3698 - val_accuracy: 0.8962 Epoch 44/50 750/750 [==============================] - 3s 4ms/step - loss: 0.1074 - accuracy: 0.9602 - val_loss: 0.3669 - val_accuracy: 0.8944 Epoch 45/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1035 - accuracy: 0.9613 - val_loss: 0.3821 - val_accuracy: 0.8952 Epoch 46/50 750/750 [==============================] - 4s 5ms/step - loss: 0.1007 - accuracy: 0.9630 - val_loss: 0.4011 - val_accuracy: 0.8895 Epoch 47/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1016 - accuracy: 0.9621 - val_loss: 0.3874 - val_accuracy: 0.8958 Epoch 48/50 750/750 [==============================] - 3s 5ms/step - loss: 0.1025 - accuracy: 0.9620 - val_loss: 0.3992 - val_accuracy: 0.8949 Epoch 49/50 750/750 [==============================] - 4s 5ms/step - loss: 0.0949 - accuracy: 0.9650 - val_loss: 0.3922 - val_accuracy: 0.8944 Epoch 50/50 750/750 [==============================] - 4s 5ms/step - loss: 0.0969 - accuracy: 0.9639 - val_loss: 0.3988 - val_accuracy: 0.8950 Wall time: 3min 6s
Once the model completes the training process, it's time to evaluate it and its predictions. The History object returned by .fit() has a history attribute which holds information about the model's performance during training
history.history.keys()
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
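Since history.history is a plain dict of per-epoch lists, it drops straight into a pandas DataFrame for easier inspection; the values below are made up for illustration, standing in for the real history:

```python
import pandas as pd

## made-up values standing in for the real history.history dict
fake_history = {'loss': [0.54, 0.41], 'accuracy': [0.81, 0.85],
                'val_loss': [0.47, 0.38], 'val_accuracy': [0.83, 0.86]}
df = pd.DataFrame(fake_history)  # one row per epoch, one column per metric
print(df.shape)  # (2, 4)
```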
I will first define 2 helper functions that can be used to plot the graphs for the training history of the model
def plot_single_graph(train_history, valid_history, end_of_epoch_history, metric_name):
    ## visualizing the loss among different epochs
    ## using an interactive graph
    ## creating a figure object
    fig = go.Figure()
    ## adding the training loss or accuracy graph to the plot
    fig.add_trace(go.Scatter(x0=1, dx=1,
                             y=train_history,
                             mode='lines',
                             name=f"Training {metric_name.title()}",
                             marker={'color': "red"},
                             hovertemplate='Epoch = %{x}<br>' +
                                 f'Training {metric_name.title()} ' + '= %{y:.4f}<extra></extra>',
                             hoverlabel={'bgcolor': 'rgba(255,0,0,0.5)'}))
    ## adding the validation loss or accuracy graph to the plot
    fig.add_trace(go.Scatter(x0=1, dx=1,
                             y=valid_history,
                             mode='lines',
                             name=f"Validation {metric_name.title()}",
                             marker={'color': "green"},
                             hovertemplate='Epoch = %{x}<br>' +
                                 f'Validation {metric_name.title()} ' + '= %{y:.4f}<extra></extra>',
                             hoverlabel={'bgcolor': 'rgba(0,255,0,0.25)'}))
    ## adding the end of epoch training loss or accuracy
    if end_of_epoch_history:
        fig.add_trace(go.Scatter(x0=1, dx=1,
                                 y=end_of_epoch_history,
                                 mode='lines',
                                 name=f"EOE Training {metric_name.title()}",
                                 marker={'color': "blue"},
                                 hovertemplate='Epoch = %{x}<br>' +
                                     f'EOE Training {metric_name.title()} ' + '= %{y:.4f}<extra></extra>',
                                 hoverlabel={'bgcolor': 'rgba(0,0,255,0.25)'}))
    ## updating the layout of the graph by adding titles and labels
    fig.update_layout(title=f'The {metric_name.lower()} value VS the number of epochs',
                      xaxis_title='Number of epochs')
    ## updating the y axis title according to the metric name
    if metric_name.lower() == 'loss':
        fig.update_layout(yaxis_title='Loss "the lower the better"')
    else:
        fig.update_layout(yaxis_title='Accuracy "the higher the better"')
    ## extending the x axis range by 1 from the left and the right
    fig.update_xaxes({'range': [0, len(train_history)+1]})
    ## creating buttons to change the hover behavior
    my_buttons = [{'label': "unlinked", 'method': "update", 'args': [{}, {"hovermode": 'closest'}]},
                  {'label': "linked", 'method': "update", 'args': [{}, {"hovermode": 'x'}]}]
    ## adding the created buttons to the plot and setting their position
    fig.update_layout({
        'updatemenus': [{
            'type': "buttons",
            'direction': 'left',
            'pad': {"l": 0, "t": 0},
            'active': 0,
            'x': 0,
            'xanchor': "left",
            'y': 1.1,
            'yanchor': "top",
            'buttons': my_buttons}]})
    ## showing the final plot
    fig.show("notebook")
def plot_double_graph(train_loss, valid_loss, end_of_epoch_loss,
                      train_accuracy, valid_accuracy, end_of_epoch_accuracy):
    ## combining the 2 graphs above into 1 graph
    ## creating a figure object
    fig = make_subplots(rows=1, cols=2,
                        # Set the subplot titles
                        subplot_titles=['Loss', 'Accuracy'],
                        # Add spacing between the subplots, ranges from 0 to 1
                        horizontal_spacing=0.15)
    ## adding the training loss graph to the plot
    fig.add_trace(go.Scatter(x0=1, dx=1,
                             y=train_loss,
                             mode='lines',
                             name="Training Loss",
                             marker={'color': "red"},
                             hovertemplate='Epoch = %{x}<br>' +
                                 'Training Loss = %{y:.4f}<extra></extra>',
                             hoverlabel={'bgcolor': 'rgba(255,0,0,0.5)'}),
                  row=1, col=1)
    ## adding the validation loss graph to the plot
    fig.add_trace(go.Scatter(x0=1, dx=1,
                             y=valid_loss,
                             mode='lines',
                             name="Validation Loss",
                             marker={'color': "green"},
                             hovertemplate='Epoch = %{x}<br>' +
                                 'Validation Loss = %{y:.4f}<extra></extra>',
                             hoverlabel={'bgcolor': 'rgba(0,255,0,0.25)'}),
                  row=1, col=1)
    ## adding the end of epoch loss graph to the plot
    if end_of_epoch_loss:
        fig.add_trace(go.Scatter(x0=1, dx=1,
                                 y=end_of_epoch_loss,
                                 mode='lines',
                                 name="EOE Training Loss",
                                 marker={'color': "blue"},
                                 hovertemplate='Epoch = %{x}<br>' +
                                     'EOE Training Loss = %{y:.4f}<extra></extra>',
                                 hoverlabel={'bgcolor': 'rgba(0,0,255,0.25)'}),
                      row=1, col=1)
    ## adding the training accuracy graph to the plot
    fig.add_trace(go.Scatter(x0=1, dx=1,
                             y=train_accuracy,
                             mode='lines',
                             name="Training Accuracy",
                             marker={'color': "red"},
                             hovertemplate='Epoch = %{x}<br>' +
                                 'Training Accuracy = %{y:.4f}<extra></extra>',
                             hoverlabel={'bgcolor': 'rgba(255,0,0,0.5)'}),
                  row=1, col=2)
    ## adding the validation accuracy graph to the plot
    fig.add_trace(go.Scatter(x0=1, dx=1,
                             y=valid_accuracy,
                             mode='lines',
                             name="Validation Accuracy",
                             marker={'color': "green"},
                             hovertemplate='Epoch = %{x}<br>' +
                                 'Validation Accuracy = %{y:.4f}<extra></extra>',
                             hoverlabel={'bgcolor': 'rgba(0,255,0,0.25)'}),
                  row=1, col=2)
    ## adding the end of epoch accuracy graph to the plot
    if end_of_epoch_accuracy:
        fig.add_trace(go.Scatter(x0=1, dx=1,
                                 y=end_of_epoch_accuracy,
                                 mode='lines',
                                 name="EOE Training Accuracy",
                                 marker={'color': "blue"},
                                 hovertemplate='Epoch = %{x}<br>' +
                                     'EOE Training Accuracy = %{y:.4f}<extra></extra>',
                                 hoverlabel={'bgcolor': 'rgba(0,0,255,0.25)'}),
                      row=1, col=2)
    ## updating the layout of the graph by adding titles and labels
    fig.update_layout(xaxis_title='Number of epochs',
                      xaxis2_title='Number of epochs',
                      yaxis_title='Loss "the lower the better"',
                      yaxis2_title='Accuracy "the higher the better"')
    ## extending the x axis range by 1 from the left and the right
    fig.update_xaxes({'range': [0, len(train_loss)+1]})
    ## creating buttons to change the hover behavior
    my_buttons = [{'label': "unlinked", 'method': "update", 'args': [{}, {"hovermode": 'closest'}]},
                  {'label': "linked", 'method': "update", 'args': [{}, {"hovermode": 'x'}]}]
    ## adding the created buttons to the plot and setting their position
    fig.update_layout({
        'updatemenus': [{
            'type': "buttons",
            'direction': 'left',
            'pad': {"l": 0, "t": 0},
            'active': 0,
            'x': 0,
            'xanchor': "left",
            'y': 1.2,
            'yanchor': "top",
            'buttons': my_buttons}]})
    ## showing the final plot
    fig.show("notebook")
plot_single_graph(history.history['loss'],
                  history.history['val_loss'],
                  eoe_train_loss,
                  'loss')
plot_single_graph(history.history['accuracy'],
                  history.history['val_accuracy'],
                  eoe_train_accs,
                  'accuracy')
At first we might think the model is underfitting during the first few epochs, since the model's performance on the validation data (Green Line) looks better than its performance on the training data (Red Line). However, that's not the case: the Red Line is calculated as a running average during each epoch, which is why I added a Blue Line representing the model's performance on the training data after each training epoch rather than during it
plot_double_graph(history.history['loss'],
                  history.history['val_loss'],
                  eoe_train_loss,
                  history.history['accuracy'],
                  history.history['val_accuracy'],
                  eoe_train_accs)
Testing the model's performance on the test set:
test_loss, test_accuracy = model.evaluate(x=test_images, y=test_labels, batch_size=100)
print(f'Loss for the test dataset = {test_loss:.4f}')
print(f'Accuracy over the test dataset = {test_accuracy:.4f}')
100/100 [==============================] - 1s 4ms/step - loss: 0.4650 - accuracy: 0.8878 Loss for the test dataset = 0.4650 Accuracy over the test dataset = 0.8878
If you recall, our test set consists of 10,000 images; each image is a numpy array of shape 28 x 28 x 1, while the label for each image is an integer that corresponds to the index of its class
print('The first image in our test set is:\n')
image = test_images[0]
label = test_labels[0]
plt.figure()
plt.imshow(np.squeeze(image), cmap=plt.cm.binary)
plt.xlabel(class_names[label])
plt.show()
The first image in our test set is:
## verifying the shape of each image and the format of the true label
print(image.shape)
print(label)
(28, 28, 1) 4
Our model expects its inputs to be of shape (batch_size x 28 x 28 x 1), so if we would like to feed the model a single image, the batch_size in this case will equal 1, and therefore we need to add an extra dimension to our image. This can be done using the numpy.expand_dims method; check the cell below to see how this method works
## the first argument is the numpy array
## the second argument is the position of that added extra dimension
np.expand_dims(image, 0).shape
(1, 28, 28, 1)
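As a side note, indexing with np.newaxis produces the same result as np.expand_dims; both are common ways to add the batch dimension. A small sketch on a zero array:

```python
import numpy as np

image = np.zeros((28, 28, 1))
## np.expand_dims and np.newaxis indexing are equivalent here
a = np.expand_dims(image, 0)
b = image[np.newaxis, ...]
print(a.shape == b.shape == (1, 28, 28, 1))  # True
```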
## feeding the model a single image
predictions = model.predict(np.expand_dims(image, 0))
display(predictions)
display(predictions.shape)
array([[5.4824375e-07, 2.0864061e-19, 1.7313558e-05, 7.3126385e-13,
9.9367440e-01, 1.1622349e-19, 6.3077328e-03, 3.0363608e-13,
3.9505105e-10, 2.6742738e-17]], dtype=float32)
(1, 10)
We notice the output of the network is not a single class name; it's 10 values which correspond to the probability distribution over the 10 classes that we have. The first value (at index 0) corresponds to the probability that the input has class label 0, the second value (at index 1) to the probability that the input has class label 1, and so on.
From this we can conclude that the sum of the 10 predicted probabilities for any input image should equal 1; we will verify that in the code cell below. We can of course convert these classes from integers to human-readable strings using the class_names list that we defined earlier from the metadata of our dataset.
## verifying the sum of the probabilites = 1
predictions.sum()
1.0
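This property comes directly from the softmax function: it exponentiates the logits and divides by their sum, so the outputs always add up to 1. A minimal NumPy sketch (not the exact TensorFlow implementation):

```python
import numpy as np

def softmax(logits):
    ## subtracting the max is a standard numerical-stability trick;
    ## it cancels out in the ratio and doesn't change the result
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(np.isclose(probs.sum(), 1.0))  # True
```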
So what the model did for us when we called the .evaluate method is that it grabbed the index of the class with the highest probability from the model's predictions, converted that index to its class, and then compared this predicted class with the true class to calculate the accuracy. Let's walk through what the model did step by step
## checking the class distribution for our test set
np.unique(test_labels, return_counts=True)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),
array([1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000],
dtype=int64))
## getting the predictions for all our test images
predictions = model.predict(test_images)
predictions.shape
(10000, 10)
To grab the index with the highest probability for each input image, we can use the NumPy .argmax() method, specifying axis=1
predictions.argmax(axis=1)
array([4, 4, 9, ..., 1, 6, 1], dtype=int64)
Now we will create a Pandas DataFrame that will help us evaluate our model's predictions further. The DataFrame will have 3 columns: a True_Class column holding the index of the true label, a Predicted_Class column holding the index of the predicted class, and an Is_Correct column, a boolean of whether the model's prediction is correct
test_results = pd.DataFrame()
test_results['True_Class'] = test_labels
test_results['Predicted_Class'] = predictions.argmax(axis=1)
test_results['Is_Correct'] = test_results.True_Class == test_results.Predicted_Class
test_results.head()
| | True_Class | Predicted_Class | Is_Correct |
|---|---|---|---|
| 0 | 4 | 4 | True |
| 1 | 4 | 4 | True |
| 2 | 9 | 9 | True |
| 3 | 7 | 7 | True |
| 4 | 5 | 5 | True |
print(f'Accuracy over the test dataset = {test_results.Is_Correct.mean()}')
Accuracy over the test dataset = 0.8878
We notice we got the same accuracy as calculated before using the model.evaluate method. So now we will create a simple report that will help us further evaluate the model over the different classes in the dataset
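That the two numbers match is no coincidence: accuracy is simply the mean of the per-example correctness booleans, which is the same quantity .evaluate reports. A tiny sketch on made-up labels:

```python
import numpy as np

## accuracy is just the fraction of predictions that match the true labels
true_labels = np.array([0, 1, 2, 2])
pred_labels = np.array([0, 1, 1, 2])
print((true_labels == pred_labels).mean())  # 0.75
```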
test_report = test_results.groupby('True_Class').agg(
    Percent_Correct=('Is_Correct', 'mean'),
    Num_Correct=('Is_Correct', 'sum'))
test_report['Class_Name'] = class_names
test_report
| True_Class | Percent_Correct | Num_Correct | Class_Name |
|---|---|---|---|
| 0 | 0.830 | 830 | T-shirt/top |
| 1 | 0.979 | 979 | Trouser |
| 2 | 0.773 | 773 | Pullover |
| 3 | 0.889 | 889 | Dress |
| 4 | 0.854 | 854 | Coat |
| 5 | 0.965 | 965 | Sandal |
| 6 | 0.708 | 708 | Shirt |
| 7 | 0.968 | 968 | Sneaker |
| 8 | 0.961 | 961 | Bag |
| 9 | 0.951 | 951 | Ankle boot |
The steps above can be implemented in the form of a function since we might need to generate that report once again within this notebook
def get_report(model, data, labels, class_names):
    ## get the model's predictions
    predictions = model.predict(data)
    ## putting the results into a Pandas DataFrame
    results = pd.DataFrame()
    results['True_Class'] = labels
    results['Predicted_Class'] = predictions.argmax(axis=1)
    results['Is_Correct'] = results.True_Class == results.Predicted_Class
    ## aggregating through the DataFrame to build a report
    report = results.groupby('True_Class').agg(
        Percent_Correct=('Is_Correct', 'mean'),
        Num_Correct=('Is_Correct', 'sum'),
        Num_Total=('Is_Correct', 'count'))
    ## adding the class names to the report
    report.insert(0, 'Class_Name', class_names)
    return report
get_report(model, test_images, test_labels, class_names)
| True_Class | Class_Name | Percent_Correct | Num_Correct | Num_Total |
|---|---|---|---|---|
| 0 | T-shirt/top | 0.830 | 830 | 1000 |
| 1 | Trouser | 0.979 | 979 | 1000 |
| 2 | Pullover | 0.773 | 773 | 1000 |
| 3 | Dress | 0.889 | 889 | 1000 |
| 4 | Coat | 0.854 | 854 | 1000 |
| 5 | Sandal | 0.965 | 965 | 1000 |
| 6 | Shirt | 0.708 | 708 | 1000 |
| 7 | Sneaker | 0.968 | 968 | 1000 |
| 8 | Bag | 0.961 | 961 | 1000 |
| 9 | Ankle boot | 0.951 | 951 | 1000 |
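Per-class accuracy tells us which classes are hard, but not what they are confused with. One way to dig deeper (not shown in the original report) is a confusion matrix, for example via `pd.crosstab`; a minimal sketch on toy labels standing in for `test_labels` and the predicted classes:

```python
import numpy as np
import pandas as pd

## toy stand-ins for the true labels and the model's predicted classes
true_classes = np.array([6, 6, 6, 6, 0, 0, 2, 2, 2, 4])
pred_classes = np.array([6, 0, 2, 6, 0, 0, 2, 6, 2, 4])

## rows = true class, columns = predicted class;
## off-diagonal cells show which classes get mixed up
confusion = pd.crosstab(pd.Series(true_classes, name='True_Class'),
                        pd.Series(pred_classes, name='Predicted_Class'))
print(confusion)
```

Run on the real `test_labels` and `predictions.argmax(axis=1)`, this would reveal, for instance, whether misclassified Shirts mostly land in the Pullover or T-shirt/top columns.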
The model is not performing well on Class 6 which corresponds to Shirts, so let's see the model's predictions for some Shirt images.
First, I will introduce a helper function that plots each image alongside the model's predicted probability distribution for it.
The function will take 4 arguments which are:
- images: a NumPy array with shape = (num_images x 28 x 28)
- class_names: a list of the class names
- true_labels: a list-like of true class labels with length = num_images
- model_probs: the model's probability predictions with shape = (num_images x num_classes)
def plot_images(images, class_names, true_labels, model_probs):
## calculate the number of rows required for our plots
n_rows = int(np.ceil(len(images)/2))
## create the required number of subplots
fig, axs = plt.subplots(nrows=n_rows, ncols=4, figsize=(16,4*n_rows))
## counters to help assign the graph to the axes
row_counter = 0
column_counter = 0
## define the styling colors
correct_color = '#327346'
wrong_color = '#b32520'
default_color = '#2163bf'
for i in range(len(images)):
## assigning the true label and the predicted class to variables
true_label = true_labels[i]
model_predictions = model_probs[i]
predicted_class = model_predictions.argmax()
## specifying which axis we are working on for the image plot
img_ax = axs[row_counter][column_counter]
## removing tick marks
img_ax.set_xticks([])
img_ax.set_yticks([])
## displaying the photo
img_ax.imshow(images[i], cmap=plt.cm.binary)
## setting the text beneath the photo
img_ax.set_xlabel(f'{class_names[true_label]}', color=default_color, fontsize=12)
## incrementing the column to switch to the other axis
column_counter += 1
## specifying which axis we are working on for the probability distribution
prob_ax = axs[row_counter][column_counter]
## plotting the probability distribution
prob_ax.bar(x=range(len(class_names)), height=model_predictions, color=default_color)
prob_ax.bar(true_label, height=model_predictions[true_label], color=correct_color)
if true_label == predicted_class:
prob_ax.set_xlabel(f'{class_names[true_label]} {100*model_predictions[true_label]:.2f}%',
color=correct_color, fontsize=12)
else:
prob_ax.bar(predicted_class, height=model_predictions[predicted_class],
color=wrong_color)
prob_ax.set_xlabel(f'{class_names[predicted_class]} {100*model_predictions[predicted_class]:.2f}%',
color=wrong_color, fontsize=12)
## incrementing the row and column counter for the next image
column_counter += 1
if column_counter == 4:
column_counter = 0
row_counter +=1
plt.show()
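The row/column counters above place each image in an even column and its probability plot in the odd column right after it, wrapping to a new row every 2 images (4 axes). A quick standalone sanity check of that counter arithmetic, outside Matplotlib:

```python
## replicate the counter logic from plot_images: record where each image axis lands
def grid_positions(n_images):
    positions = []
    row, col = 0, 0
    for _ in range(n_images):
        positions.append((row, col))  # the image axis
        col += 1                      # the probability axis sits right after it
        col += 1                      # move on to the next image
        if col == 4:
            col = 0
            row += 1
    return positions

## 10 images -> images at columns 0 and 2, spread over 5 rows
print(grid_positions(10))
```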
Let's test the helper function over the first 10 images within the test set
plot_images(images = test_images[:10],
class_names = class_names,
true_labels = test_labels[:10],
model_probs = model.predict(test_images[:10]))
Now, let's grab the indices of the first 20 Shirt images within our test set and use the helper function above to check the True class and the Predicted class for them
shirt_idx = pd.Series(test_labels == 6)[test_labels == 6].index[:20]
plot_images(images = test_images[shirt_idx],
class_names = class_names,
true_labels = test_labels[shirt_idx],
model_probs = model.predict(test_images[shirt_idx]))
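As an aside, the pandas expression used above to grab the Shirt indices can also be written more directly with `np.where`. A small self-contained illustration, where `test_labels_demo` is a made-up stand-in for the real label array:

```python
import numpy as np

## toy stand-in for test_labels
test_labels_demo = np.array([6, 1, 6, 3, 6, 6, 0, 6])

## np.where returns the positions at which the condition holds
shirt_idx_demo = np.where(test_labels_demo == 6)[0][:20]
print(shirt_idx_demo)
```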
From the photos above, we can see that some Shirt photos are mistakenly classified as other tops such as Pullovers or T-shirts, and even the human eye can hardly distinguish between these classes, especially for low-resolution photos like the ones here. Later we will try other techniques that can help improve our model's predictions.
In Keras, saving a model is very simple: just call the model.save() method and pass the directory where you would like to save the model. This lets you reuse the model later for predictions without training it from scratch, and you can also continue training it after loading if you would like.
## saving the model
model.save('./first_keras_model')
INFO:tensorflow:Assets written to: ./first_keras_model\assets
## loading the model
reloaded_keras_model = keras.models.load_model('./first_keras_model')
reloaded_keras_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
dense (Dense)                (None, 128)               100480
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
## checking that the reloaded model gives the same evaluation metrics as the original model
test_loss, test_accuracy = reloaded_keras_model.evaluate(test_images, test_labels, batch_size=100)
print(f'Loss for the test dataset = {test_loss:.4f}')
print(f'Accuracy over the test dataset = {test_accuracy:.4f}')
100/100 [==============================] - 0s 3ms/step - loss: 0.4650 - accuracy: 0.8878
Loss for the test dataset = 0.4650
Accuracy over the test dataset = 0.8878
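Matching loss and accuracy is good evidence, but an even stricter check is to compare the raw predictions element-wise with `np.allclose`. A hedged sketch of that idea, using toy arrays as stand-ins for `model.predict(test_images)` and `reloaded_keras_model.predict(test_images)` since the trained models are not re-run here:

```python
import numpy as np

## stand-ins for the original and reloaded models' predicted probabilities
original_probs = np.array([[0.1, 0.7, 0.2],
                           [0.6, 0.3, 0.1]])
reloaded_probs = original_probs.copy()

## a faithfully saved and reloaded model should reproduce its predictions
## up to floating-point tolerance
same = np.allclose(original_probs, reloaded_probs)
print(same)
```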